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Sparse  Representation  for  Color  Image 

Restoration 

Julien  Mairal,1*  Michael  Elad,2  and  Guillermo  Sapiro3 


Abstract 

Sparse  representations  of  signals  have  drawn  considerable  interest  in  recent  years.  The  assumption 
that  natural  signals,  such  as  images,  admit  a  sparse  decomposition  over  a  redundant  dictionary  leads 
to  efficient  algorithms  for  handling  such  sources  of  data.  In  particular,  the  design  of  well  adapted 
dictionaries  for  images  has  been  a  major  challenge.  The  K-SVD  has  been  recently  proposed  for  this 
task  [1],  and  shown  to  perform  very  well  for  various  gray-scale  image  processing  tasks.  In  this  paper 
we  address  the  problem  of  learning  dictionaries  for  color  images  and  extend  the  K-SVD-based  gray¬ 
scale  image  denoising  algorithm  that  appears  in  [2].  This  work  puts  forward  ways  for  handling  non- 
homogeneous  noise  and  missing  information,  paving  the  way  to  state-of-the-art  results  in  applications 
such  as  color  image  denoising,  demosaicing,  and  inpainting,  as  demonstrated  in  this  paper. 

EDICS  Category:  COL-COLR  (Color  processing) 

I.  Introduction 

In  signal  and  image  processing  we  often  impose  an  arbitrary  model  to  describe  the  data  source. 
Such  a  model  becomes  paramount  when  developing  algorithms  for  processing  these  signals.  In 
this  context,  Markov-Random-Field  (MRF),  Principal-Component-Analysis  (PCA),  and  other 
related  techniques  are  popular  and  often  used. 
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The  Sparseland  model  is  an  emerging  and  powerful  method  to  describe  signals  based  on  the 
sparsity  and  redundancy  of  their  representations  [3],  [2].  For  signals  from  a  class  T  C  Mn,  this 
model  suggests  the  existence  of  a  specific  dictionary  (i.e.,  a  matrix)  D  £  W‘xk  which  contains 
k  prototype  signals,  also  referred  to  as  atoms.  The  model  assumes  that  for  any  signal  x  £  T, 
there  exists  a  sparse  linear  combination  of  atoms  from  D  that  approximates  it  well.  Put  more 
formally,  this  reads 

Vx  £  T,  3a  £  such  that  x  fa  Da  and  ||a||0  -C  n. 

The  notation  ||  •  ||0  is  the  f°-norm  which  counts  the  number  of  non-zero  elements  in  a  vector. 
We  typically  assume  that  k  >  n,  implying  that  the  dictionary  D  is  redundant  in  describing  x. 

If  we  consider  the  case  where  T  is  the  set  of  natural  images,  dictionaries  such  as  wavelets 
of  various  sorts  [4],  curvelets  [5],  [6],  contourlets  [7],  [8],  wedgelets  [9],  bandlets  [10],  [11], 
and  steerable  wavelets  [12],  [13],  are  all  attempts  to  design  dictionaries  that  fulfill  the  above 
model  assumption.  Indeed,  these  various  transforms  have  led  to  highly  effective  algorithms  in 
many  applications  in  image  processing,  such  as  compression  [14],  denoising  [15],  [16],  [17], 
[18],  inpainting  [19],  and  more.  Common  to  all  these  pre-defined  dictionaries  is  their  analytical 
nature,  and  their  reliance  on  the  geometrical  nature  of  natural  images,  especially  piece-wise 
smooth  ones. 

In  [1],  the  authors  introduce  the  K-SVD  algorithm,  a  way  to  learn  a  dictionary,  instead  of 
exploiting  pre-defined  ones  as  described  above,  that  leads  to  sparse  representations  on  training 
signals  drawn  from  T.  This  algorithm  uses  either  Orthogonal  Matching  Pursuit  (OMP)  [20],  [21], 
[22],  or  Basis  Pursuit  (BP),  [23],  as  part  of  its  iterative  procedure  for  learning  the  dictionary. 
The  follow-up  work  reported  in  [3],  [2]  proposes  a  novel  and  highly  effective  image  denoising 
algorithm  for  the  removal  of  additive  white  Gaussian  noise  with  gray-scale  images.  Their  pro¬ 
posed  method  includes  the  use  of  the  K-SVD  for  learning  the  dictionary  from  the  noisy  image 
directly. 

In  this  paper,  our  main  aim  is  to  extend  the  algorithm  reported  in  [2]  to  color  images  (and 
to  vector-valued  images  in  general),  and  then  show  the  applicability  of  this  extension  to  other 
inverse  problems  in  color  image  processing.  The  extension  to  color  can  be  easily  performed  by  a 
simple  concatenation  of  the  RGB  values  to  a  single  vector  and  training  on  those  directly,  which 
gives  already  better  results  than  denoising  each  channel  separately.  However,  such  a  process 
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produces  false  colors  and  artifacts,  which  are  typically  encountered  in  color  image  processing. 
The  first  part  of  this  work  presents  a  method  to  overcome  these  artifacts,  by  adapting  the  OMP 
inner-product  (metric)  definition. 

This  paper  also  describes  an  extension  of  the  denoising  algorithm  to  the  proper  handling  of 
non-homogeneous  noise.  This  development  is  crucial  in  cases  of  missing  values,  such  as  in  the 
color  image  demosaicing  and  the  inpainting  problems.  Treating  the  missing  values  as  corrupted 
by  a  strong  (impulse)  noise,  our  general  setting  fits  both  problems.  We  demonstrate  the  success 
of  the  proposed  scheme  in  demosaicing  and  inpainting,  as  well  as  color  denoising  applications, 
exhibiting  in  all  cases  state-of-the-art  results.  We  also  show  that  interacting  between  the  learning 
and  the  restoration  further  improves  the  color  image  processing  results. 

This  paper  is  organized  as  follows  -  In  Section  2  we  describe  prior  art  on  example-based  image 
denoising  (and  general  image  processing)  methods.  Section  3  is  devoted  to  a  brief  description  of 
the  K-SVD-based  gray-scale  image  denoising  algorithm  as  proposed  in  [2],  Section  4  describes 
the  novelties  offered  in  this  paper  -  the  extension  to  color  images,  the  handling  of  color  artifacts, 
and  finally  the  treatment  of  non-homogeneous  noise,  along  with  its  relation  to  demosaicing  and 
inpainting.  In  Section  5  we  provide  various  experiments  that  demonstrate  the  effectiveness  of  the 
proposed  algorithms.  Section  6  concludes  this  paper  with  a  brief  description  of  its  contributions 
and  a  list  of  open  questions  for  future  work,  including  preliminary  discussion  and  results  on 
how  to  extend  the  above  to  a  multiscale  algorithm. 

II.  Prior  art 

Many  problems  in  image  processing  and  computer  vision  are  in  a  dire  need  for  prior  models 
of  the  images  they  handle.  This  is  especially  true  whenever  information  is  missing,  damaged,  or 
modified.  Armed  with  a  good  generic  image  prior,  restoration  algorithms  become  very  effective. 
The  research  activity  on  the  topic  of  image  priors  is  too  wide  to  be  compacted  into  this  paper,  and 
as  our  interest  is  primarily  in  learned  priors,  such  general  survey  is  beyond  its  scope  anyhow. 
We  therefore  restrict  our  discussion  to  recent  methods  that  lean  on  image  examples  in  their 
construction  of  the  prior. 

When  basing  the  learned  prior  on  image  examples,  the  first  junction  to  cross  is  the  one  which 
splits  between  parametric  and  non-parametric  prior  models.  The  parametric  path  suggests  an 
analytical  expression  for  the  prior,  and  directs  the  learning  process  to  tune  the  prior  parameters 
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based  on  examples.  Such  is  the  case  with  the  MRF  prior  learned  in  [24]  and  later  in  [25], 
[26];  the  wavelet  based  image  prior  as  appears  in  [27];  the  Tikhonov  regularization  proposed  by 
Haber  and  Tenorio  [28];  the  super-resolution  approach  adopted  by  Baker  and  Kanade  [29];  and 
the  recent  K-SVD  denoising  as  described  in  [3],  [2], 

The  alternative  path,  a  non-parametric  learning,  use  image  examples  directly  within  the  re¬ 
construction  process,  as  practiced  by  Efros  and  Leung  for  texture  synthesis  [30];  by  Freeman 
et.  al.  for  super-resolution  [31],  [32];  and  by  several  follow-up  works  [33],  [34],  [35],  [36]  for 
super-resolution  and  inpainting.  Interestingly,  most  of  the  above  direct  methods  avoid  the  prior 
and  target  instead  the  posterior  density,  from  which  reconstruction  is  easily  obtained. 

The  second  major  junction  to  cross  in  exploiting  examples  refers  to  the  question  of  the  origin 
of  these  examples.  Many  of  the  above  described  works  use  a  separate  corpus  of  training  images 
for  learning  the  prior  (or  its  parameters).  The  alternative  option  is  to  use  examples  from  the 
corrupted  image  itself.  This  surprising  idea  has  been  proposed  in  [37]  as  a  universal  denoiser  of 
images,  which  learns  the  posterior  from  the  given  image  in  a  way  inspired  by  the  Lempel-Ziv 
universal  compression  algorithm.  Another  path  of  such  works  is  the  Non-Local-Means  [38],  [39] 
and  related  works  [40],  [41],  Interestingly,  the  work  in  [2]  belongs  to  this  family  as  well,  as  the 
dictionary  can  be  based  on  the  noisy  image  itself. 

Most  of  the  above  methods  deploy  processing  of  small  image  patches,  a  theme  that  char¬ 
acterizes  our  method  as  well.  In  this  paper  we  present  a  framework  for  learning  a  sparsifying 
dictionary  for  color  image  patches  taken  from  natural  images.  As  such,  the  approach  we  adopt 
is  the  parametric  one.  The  learning  we  propose  leans  on  both  external  data-set,  as  well  as  on 
the  damaged  image  directly.  The  novelty  of  this  paper  is  in  the  way  of  introducing  the  color  to 
the  K-SVD  algorithm  such  that  it  avoids  typical  artifacts,  and  its  extension  to  non  homogeneous 
noise,  which  enables  the  handling  of  color  inpainting  and  demosaicing. 

The  only  work  we  are  aware  of,  which  handles  color  images  within  the  framework  of  para¬ 
metric  learned  models,  is  the  one  reported  in  [25].  This  work  builds  on  the  Field  of  Experts,  as 
developed  in  [26],  to  learn  an  MRF  model  for  color  images,  and  uses  it  successfully  for  image 
denoising.  The  main  theme  of  this  paper  is  an  attempt  to  circumvent  the  high-dimensionality  of 
the  training  space  by  several  simplifications  over  the  original  method  in  [26],  Color  artifacts  are 
not  mentioned,  and  indeed,  the  shown  results  demonstrate  a  phenomenon  we  have  experienced 
too,  of  a  tendency  to  wash-out  the  color  content  of  the  image.  We  address  this  in  this  paper 
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by  proposing  a  new  metric  in  the  sparsity  representation.  We  will  return  to  this  work  in  the 
experimental  results  section,  and  show  comparisons  of  performance  favorable  to  our  framework. 

III.  The  gray-scale  K-SVD  denoising  algorithm 

In  [3],  [2],  Aharon  and  Elad  present  a  K-SVD-based  algorithm  for  denoising  of  gray-scale 
images  with  additive  homogeneous  white  Gaussian  noise.  We  now  briefly  review  the  main 
mathematical  framework  of  this  approach,  as  our  work  builds  upon  it.  First,  let  x0  be  a  clean 
image  written  as  a  column  vector  of  length  N.  Then  one  considers  its  noisy  version, 

y  =  x0  +  w, 

where  w  is  a  white  Gaussian  noise  with  a  spatially  uniform  deviation  a,  which  is  assumed  to  be 
known.  Given  fixed-size  patches  yfn  x  yfn,  one  assumes  that  all  such  patches  in  the  clean  image 
x0  admit  a  sparse  representation.  Addressing  the  denoising  problem  as  a  sparse  decomposition 
technique  per  each  patch  leads  to  the  following  energy  minimization  problem: 

{<%D,x}  =  arg  min  A||x  -  y 1 1|  4  Y] //,./|  lnO'l  lo  +  I  lD^j  -  .  (1) 

hJ  ij 

In  this  equation,  x  is  the  estimator  of  x0,  and  the  dictionary  D  e  Rnxk  is  an  estimator  of  the 
optimal  dictionary  which  leads  to  the  sparsest  representation  of  the  patches  in  the  recovered 
image.  The  indices  [i,j]  mark  the  location  of  the  patch  in  the  image  (representing  it’s  top-left 
corner).  The  vectors  «,j  G  Mk  are  the  sparse  representations  for  the  [*,  j]-th  patch  in  x  using  the 
dictionary  D.  The  operator  RtJ  is  a  binary  n  x  N  matrix  which  extracts  the  square  yfn  x  yfn 
patch  of  coordinates  [i,j]  from  the  image  written  as  a  column  vector  of  size  N. 

The  first  term  in  Equation  (1)  introduces  the  likelihood  force  that  demands  a  proximity  between 
x  and  y.  The  second  and  the  third  terms  together  pose  the  image  prior.  This  regularization  term 
assumes  that  good-behaved  natural  images  are  to  exhibit  a  sparse  representation  for  every  patch, 
and  from  every  location  in  the  image,  over  the  learned  dictionary  D.  The  second  term  provides 
the  sparsest  representation,  and  the  third  term  ensures  the  consistency  of  the  decomposition. 
The  choice  of  the  norm  L2  is  relatively  arbitrary,  and  could  be  changed  in  this  formulation,  as 
later  proposed  in  this  paper  for  color  images.  Note  that  norms  different  than  L2  might  lead  to 
difficulties  in  the  minimization. 


6 


To  approximate  a  solution  for  this  complex  minimization  task,  the  authors  of  [3],  [2]  put 
forward  an  iterative  method  that  incorporates  the  K-SVD  algorithm,  as  presented  in  [1],  Figure 
1  presents  this  image  denoising  algorithm  in  details. 


Parameters:  A  (Lagrange  multiplier);  C  (noise  gain);  J  (the  number  of  iterations);  k 
(size  of  the  dictionary);  n  (size  of  the  patches  and  the  atoms). 

Initialization:  Set  x  =  y;  Let  D  =  (d/)/el.../,:  be  some  initial  dictionary. 

Loop:  Repeat  J  times 

•  Sparse  Coding:  Fix  D  and  use  OMP  to  compute  d,LJ  per  each  patch  by  solving 

Vij  &ij  =  argmin  ||a||o  subject  to  ||RjjX  —  Doj^  <  n(Ca)2 .  (2) 

a 

•  Dictionary  Update:  Fix  all  at],  and  for  each  atom  l  £  1,2 , ,k  in  D, 

-  Select  the  set  of  patches  which  use  this  atom,  Ul  =  {[hj]\&ij(l)  7^  0}. 

-  For  each  patch  [i,j]  £  ui,  compute  its  residual,  e*  •  =  R^x  —  Db-^  +d 

-  Set  E;  as  the  matrix  whose  columns  are  the  ek ,  and  a1  the  row  vector  whose 
elements  are  the 

-  Update  d;  and  the  d%](l)  by  minimizing: 

(di,al)  =  axg  min  ||Ej-do:||^.  (3) 

a,||d||2=l 

This  one-rank  approximation  is  performed  by  a  truncated  Singular  Value 
Decomposition  (SVD)  of  the  matrix  E;. 

Averaging:  Perform  a  weighted  average: 

A  =  (ai  +  Er5r«)"1(a>'  +  Er5d“«)-  <4) 

ij  ij 

Fig.  1.  The  K-SVD-based  image  denoising  algorithm  as  proposed  in  [2]. 


One  can  notice  that  different  steps  in  this  algorithm  reject  the  noise.  First,  the  Matching  Pursuit 
(OMP)  stops  when  the  approximation  reaches  a  sphere  of  radius  y/nCa  in  the  patches’  space 
in  order  not  to  reconstruct  the  noise.  Then,  the  SVD  selects  an  “average”  new  direction  for 
each  atom,  which  rejects  noise  from  the  dictionary.  Finally,  the  last  formula  performs  an  average 
between  the  representation  of  overlapping  patches. 
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Here,  the  choice  of  the  parameter  C  is  very  important  and  depends  on  the  dimension  of 
the  patches:  If  w'  is  a  n-dimensional  Gaussian  vector,  1 1 w'|  |2  is  distributed  by  the  generalized 
Rayleigh  law  [42]  which  leads  to  the  following  result: 

nC2 

P(||w'||2  <  VnCa)  =  — I  z^~1e~zdz. 

1  v  2/  Jz= 0 

In  [2],  C  was  tuned  empirically  to  C  =  1.15  for  n  =  8  x  8  =  64.  Here  we  choose  the  rule 

P(||w'||2  <  VnCa)  =  0.93,  (5) 

which  provides  a  good  parameter  C  for  any  dimension  n. 

This  approach  leads  to  state-of-the-art  gray-scale  image  denoising  performance  as  shown  in 
[3],  [2],  The  main  challenge  to  extend  this  work  to  color  images  is  to  make  it  able  to  capture 
some  correlation  between  the  channels  and  to  reconstruct  the  image  without  adding  artifacts.  Both 
simple  methods,  such  as  denoising  each  channel  separately  and  denoising  directly  each  patch 
as  a  long  concatenated  RGB  vector,  fail  respectively  in  one  of  these  two  challenges.  Moreover, 
for  classical  color  image  processing  applications  such  as  demosaicing,  denoising,  and  image 
inpainting,  [43],  spatial  and/or  spectral  non-uniform  noise  has  to  be  handled.  These  challenges 
are  addressed  next. 


IV.  Sparse  color  image  representation 

We  now  turn  to  detail  the  proposed  fundamental  extensions  to  the  above  gray-scale  framework, 
and  put  forward  algorithms  addressing  several  different  tasks,  such  as  denoising  of  color  images, 
denoising  with  non-uniform  noise,  inpainting  small  holes  of  color  images,  and  demosaicing. 

A.  Denoising  of  color  images 

The  simplest  way  to  extend  the  K-SVD  algorithm  to  the  denoising  of  color  images  is  to  denoise 
each  single  channel  using  a  separate  algorithm  with  possibly  different  dictionaries.  However, 
our  goal  is  to  take  advantage  of  the  learning  capabilities  of  the  K-SVD  algorithm  to  capture  the 
correlation  between  the  different  color  channels.  We  will  show  in  our  experimental  results  that 
a  joint  method  outperforms  this  trivial  plane-by-plane  denoising.  Recall  that  although  in  this 
paper  we  concentrate  on  color  images,  the  key  extension  components  here  introduced  are  valid 


for  other  modalities  of  vector-valued  images,  where  the  correlation  between  the  planes  might  be 
even  stronger. 

The  problem  we  address  is  the  denoising  of  RGB  color  images,  represented  by  a  column 
vector  y,  contaminated  by  some  white  Gaussian  noise  w  with  a  known  deviation  a,  which 
has  been  added  to  each  channel.  (As  we  show  below,  the  noise  does  not  have  to  be  uniform 
across  the  channels  or  across  the  image.)  Color-spaces  such  as  YCbCr,  Lab,  and  other  Lumi¬ 
nance/Chrominance  separations  are  often  used  in  image  denoising  because  it  is  natural  to  handle 
the  chroma  and  luma  layers  differently,  and  also  because  the  L2-norm  in  these  spaces  is  more 
reliable  and  better  reflects  the  human  visual  system’s  perception.  However,  in  this  work  we 
choose  to  stay  with  the  original  RGB  space,  as  any  color  conversion  changes  the  structure  of 
the  noise.  Since  the  OMP  step  in  the  algorithm  uses  an  intrinsic  hypothesis  of  a  noise  with  a 
sphere  structure  in  the  patches’  space,  such  assumption  remains  valid  only  in  the  RGB  domain. 
Other  noise  geometries  do  not  necessarily  guarantee  the  performance  of  the  OMP. 

In  order  to  keep  a  reasonable  computational  complexity  for  the  color  extensions  presented  in 
this  work,  we  use  dictionaries  that  are  not  particularly  larger  than  those  practiced  in  the  gray-scale 
version  of  the  algorithm.  More  specifically,  in  [2]  the  authors  use  dictionaries  with  256  atoms 
and  patches  of  size  8x8.  Applying  directly  the  K-SVD  algorithm  on  (three  dimensional)  patches 
of  size  8x8x3  (containing  the  RGB  layers)  with  256  atoms,  leads  already  to  substantially 
better  results  than  denoising  each  channel  separately.  However,  this  direct  approach  produces 
artifacts  -  especially  a  tendency  to  reduce  the  color  saturation  in  the  reconstruction.  We  observe 
that  during  the  algorithm,  the  OMP  is  likely  to  follow  the  “gray”  axis,  which  is  the  axis  defined 
by  r  =  g  =  b  in  the  RGB  color  space. 

Before  proceeding  to  explain  the  proposed  solution  to  this  color  bias  and  washing  effect,  let 
us  explain  why  it  happens.  As  mentioned  before,  this  effect  can  be  seen  in  the  results  in  [25], 
although  it  has  not  been  explicitly  addressed  there. 

First,  at  the  intuitive  level,  relatively  small  dictionaries  within  the  order  of  256  atoms,  for 
example,  are  not  rich  enough  to  represent  the  diversity  of  colors  in  natural  images.  Therefore, 
training  a  dictionary  on  a  generic  database  leads  to  many  gray  or  low  chrominance  atoms  which 
represent  the  basic  spatial  structures  of  the  images.  This  behavior  can  be  observed  in  Figure 
2.  This  result  is  not  unexpected  since  this  global  dictionary  is  aiming  at  being  generic.  This 
predominance  of  gray  atoms  in  the  dictionary  encourages  the  image  patches  approximation  to 
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“follow”  the  gray  axis  by  picking  some  gray  atoms  (via  the  OMP  step,  see  below),  and  this 
introduces  a  bias  and  color  washing  in  the  reconstruction.  Examples  of  such  color  artifacts 
resulting  from  global  dictionaries  are  presented  in  Figure  3.  Using  an  adaptive  dictionary  tends 
to  reduce  but  not  eliminate  these  artifacts.  A  look  at  the  dictionary  in  Figure  4,  learned  on  a 
specific  image  instead  of  a  database  (global  dictionary),  shows  that  the  atoms  are  usually  more 
colored.  Our  experiments  showed  that  these  color  artifacts  are  still  present  on  some  image  details 
even  with  adaptive  dictionaries.  One  might  be  tempted  to  solve  the  above  problem  by  increasing 
k  and  thus  adding  redundancy  to  the  dictionary.  This,  however,  is  counter  productive  in  two 
important  ways  -  the  obtained  algorithm  becomes  computationally  more  demanding,  and  as  the 
images  we  handle  are  getting  close  in  size  to  the  dictionary,  over-fitting  in  the  learning  process 
in  unavoidable. 
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(a)  5  x  5  x  3  patches 


Fig.  2.  Dictionaries  with  256  atoms  learned  on  a  generic  database  of  natural  images,  with  two  different  sizes  of  patches.  Note 
the  large  number  of  color-less  atoms.  Since  the  atoms  can  have  negative  values,  the  vectors  are  presented  scaled  and  shifted  to 
the  [0,  255]  range  per  channel. 


We  address  this  color  problem  by  changing  the  metric  of  the  OMP  as  it  will  be  explained 
shortly.  The  OMP  is  a  greedy  algorithm  that  aims  to  approximate  a  solution  of  Equation  (2).  It 
consists  of  selecting  at  each  iteration  the  best  atom  from  the  dictionary,  which  is  the  one  that 
maximizes  its  inner  product  with  the  residual  (minimizing  the  error  metric),  and  then  updating 
the  residual  by  performing  an  orthogonal  projection  of  the  signal  one  wants  to  approximate 
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(a)  Original 


(b)  Original  algorithm,  7  =  0,  (c)  Proposed  algorithm,  7  = 

PSNR=28.78  dB  5.25,  PSNR=30.04  dB 


Fig.  3.  Examples  of  color  artifacts  while  reconstructing  a  damaged  version  of  the  image  3(a)  without  the  improvement  here 
proposed  (7  =  0  in  the  new  metric).  Color  artifacts  are  reduced  with  our  proposed  technique  (7  =  5.25  in  our  proposed  new 
metric).  Both  images  have  been  denoised  with  the  same  global  dictionary.  In  3(b),  one  observes  a  bias  effect  in  the  color  from 
the  castle  and  in  some  part  of  the  water.  What  is  more,  the  color  of  the  sky  is  piecewise  constant  when  7  =  0  (false  contours), 
which  is  another  artifact  our  approach  corrected. 


onto  the  vectorial  space  generated  by  the  previously  selected  atoms.  This  orthogonalization  is 
important  since  it  gives  more  stability  and  a  faster  convergence  for  this  greedy  algorithm.  For 
details,  the  reader  should  refer  to  [23],  [22],  [21]. 

An  additional,  more  formal  way  to  explain  the  lack  of  colors  and  the  color  bias  in  the 
reconstruction  is  to  note  that  the  OMP  does  not  guarantee  that  the  reconstructed  patch  will 
maintain  the  average  color  of  the  original  one.  Therefore,  the  following  relationship  for  the 
patch  ij,  E(Rjj-y)  =  E(R,-;xo  +  Rt/w)  =  E(Rv-x),  does  not  necessarily  hold.  If  the  diversity 
of  colors  is  not  important  enough  in  the  dictionary,  the  pursuit  is  likely  to  follow  some  other 
direction  in  the  patches’  space.  But  with  color  images  and  the  corresponding  increase  in  the 
dimensionality,  our  experiments  show  that  this  is  clearly  the  case.  To  address  this,  we  add  a 
force  during  the  OMP  which  tends  to  minimize  also  the  bias  between  the  input  image  and  the 
reconstruction  on  each  channel.  Considering  that  y  and  x  are  two  patches  written  as  column 
vectors  (R,G,  B)T ,  we  define  a  new  inner  product  to  be  used  in  the  OMP  step: 


<  y,  x  >7=  yTx  +  ^yTKTKx  =  yT(I  +  ^K)x, 
nz  n 


(6) 
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(a)  Training  Image 


(b)  Resulting  dictionary 


Fig.  4.  Figure  4(b)  is  the  dictionary  learned  on  the  image  4(a).  The  dictionary  is  more  colored  than  the  global  one. 


where 
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Here,  Jn  is  a  n  x  n  matrix  filled  with  ones,  and  7  is  a  new  parameter  which  can  be  tuned 
to  increase  or  discard  this  correction.  We  empirically  fixed  this  parameter  to  7  =  5.25.  The 
first  term  in  this  equation  is  the  ordinary  Euclidean  inner  product.  The  second  term  with  the 
matrix  K  computes  an  estimator  of  E(y)  and  E(x)  on  each  channel  and  multiplies  them,  thereby 
forcing  the  selected  atoms  to  take  into  account  the  average  colors.  Examples  of  results  from  our 
algorithm  are  presented  on  Figure  3,  with  different  values  for  this  parameter  7,  illustrating  the 
efficiency  of  our  approach  in  reducing  the  color  artifacts.  This  correction  proved  to  be  crucial 
in  our  process,  especially  for  global  dictionaries  which  have  a  lot  of  gray  atoms  as  mentioned 
above. 

A  simple  implementation  of  the  above  ideas  follows  from  the  simple  fact  that  I  +  = 

(I  +  ^K)T(I  +  7K),  with  7  =  2 a  +  a2.  Thereby,  scaling  each  patch  and  each  atom  of  the 
dictionary  by  multiplying  them  by  I  +  ^K,  and  taking  into  account  a  normalization  factor,  leads 
to  a  straightforward  implementation  of  the  OMP  with  the  new  inner  product.  This  approach 
effectively  changes  the  metric  in  Equation  (2).  We  chose  not  to  change  the  stopping  criterion 
in  the  OMP  to  prevent  any  blurring  effect  due  to  the  increase  in  low  frequency  caused  by  this 


scaling.  One  could  wonder  why  we  chose  not  to  change  the  metric  as  well  in  equations  (3)  and 
(4),  and  thereby  in  Equation  (1)  (this  could  effectively  be  achieved  by  keeping  the  scaled  image 
and  dictionaries  all  across  the  algorithm).  In  fact,  this  metric  (inner  product)  modification  is  here 
simply  to  correct  a  defect  of  the  OMP  when  a  dictionary  can  not  provide  enough  diversity  in 
the  choice  of  colors  and  does  not  aim  at  changing  the  global  formulation. 

To  conclude,  the  basic  color  denoising  algorithm  follows  the  original  K-SVD,  applied  to 
\JTi  x  y/n  x  3  patches,  with  a  new  metric  in  the  OMP  step  that  explicitly  addresses  critical  color 
artifacts. 

B.  Extension  to  non-homo geneous  noise 

We  now  extend  the  K-SVD  algorithm  to  non-uniform  noise.  This  problem  is  very  important 
as  non-uniform  noise  across  color  channels  is  very  common  in  digital  cameras.  Spatially  non- 
uniform  noise  also  becomes  very  important  for  color  demosaicing  and  inpainting. 

To  simplify  the  presentation,  we  first  consider  the  case  of  gray-scale  images.  We  denote  by 
op  >  0  the  deviation  of  the  noise  at  the  pixel  p.  We  assume  that  this  vector  o  is  known  (this 
assumption  is  natural  for  demosaicing,  inpainting,  and  color  dependent  noise).  In  order  to  be 
able  to  use  a  consistent  OMP,  we  need  to  have  a  sphere  structure  for  the  noise  in  the  patches’ 
space.  This  can  be  explained  by  the  fact  that  this  greedy  algorithm  aims  at  finding  the  sparsest 
path  from  the  null  vector  to  the  vector  to  approximate  in  the  space  generated  by  the  dictionary, 
by  selecting  iteratively  atoms  and  reducing  the  norm  of  the  residual.  To  prevent  the  algorithm 
to  retrieve  the  noise,  one  has  to  stop  the  pursuit  when  it  reaches  a  high  probability  of  finding 
the  value  of  the  non-noisy  patch.  As  the  only  aim  of  the  algorithm  is  to  reduce  the  norm  of  the 
residual,  the  algorithm  will  provide  a  maximum  efficiency  if  the  stopping  criterion  is  based  on  a 
threshold  for  this  norm,  thereby  it  imposes  a  sphere  structure  for  the  noise  in  the  norm’s  metric. 

The  first  natural  idea  would  be  to  scale  the  data  so  that  the  deviation  of  the  noise  becomes 
uniform.  Scaling  the  data  and  then  approximating  the  scaled  models  leads  to  loose  the  image 
natural  structure,  which  is  exactly  what  we  are  trying  to  learn  and  exploit.  Therefore,  we  need  to 
approximate  the  non-scaled  data  using  a  different  metric  for  each  patch  where  the  noise  would 
have  a  sphere  structure.  Introducing  a  vector  (3  composed  of  weights  for  each  pixel: 
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It  leads  us  to  define  a  weighted  K-SVD  algorithm  based  on  a  different  metric  for  each  patch. 
Denoting  by  0  the  element-wise  multiplication  between  two  vectors,  we  aim  at  solving  the 
following  problem,  which  replaces  Equation  (1): 

{c%,D,x}  =  arg  min  A||/?®(x-y)|||+^//idladlo+5^  ||(Ri^)®(Da'ii-Rijx)||i  •  (7) 

-U  ,  OL  xj*)  X 

ij  ij 

The  OMP  step  then  aims  at  solving  the  following  equation  for  the  patch  ij,  instead  of  (2): 

Vij,  &ij  =  arg  min  ||a||0  subject  to  ||(R,  ;T)  0  (Ryx  —  Dev) 1 1|  <  j]Rii/?||0(C'min(jp)2.  (8) 

a  p 

The  term  |  R(J/3  |0  here  counts  the  number  of  pixels  in  the  patch  ij  without  a  coefficient  (3  equal 
to  zero  which  should  therefore  be  taken  into  account.  The  new  inner  product  (metric)  associated 
with  our  problem  for  any  vector  x  and  y  becomes: 

<  x,  y  >i:j  =  ((R ijP)  0  x)t((R ij/3)  0  y). 

Concerning  the  learning  step,  the  natural  approach  consists  of  minimizing  for  each  atom  l  the 
following  energy,  instead  of  Equation  (3): 

(d i,  a1)  =  arg  min  1 1  fa  0  (Ej  -  da)  | \%,  (9) 

a,||d||2=l 

where  fa  is  the  matrix  whose  size  is  the  same  as  E;  and  where  each  column  corresponding  to 
an  index  [i,j]  is  Riy/3.  This  problem  is  known  as  a  weighted  one-rank  approximation  matrix,  is 
not  simple  and  has  not  an  unique  solution.  In  [44],  Srebro  and  Jaakkola  put  forward  a  simple 
iterative  algorithm  which  gives  an  approximated  solution  of  a  local  minimum.  Nevertheless,  this 
algorithm  requires  a  SVD  for  each  of  its  iterations,  being  relatively  complex.  We  chose  instead 
for  each  atom  l  to  perform  a  one-rank  approximation  with  a  single  truncated  SVD  of  the  most 
consistent  term,  which  is  composed  of  the  contribution  of  one  atom  in  the  reconstruction  plus 
its  weighted  residual: 

(di,Qi/)  =  arg  min  ||Ef  —  do:||fi,  (10) 

«.lldl|2=l 

where  Ef  is  the  matrix  whose  columns  are 

ef  fa~)  =  (Rijff)  0  (R ijX  -  t>aij)  +  (11) 

weighted  residual  contribution 

Note  that  we  can  rewrite  this  expression  as 


ei(ij)  =  (RijP)  ®  (R ijX  -  D«y  +  d i&ifal))  +  (In  -  R ij/3)  0  d i&ifal) 
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where  ln  denotes  a  column  vector  filled  with  ones.  This  is  actually  the  first  iteration  of  the 
algorithm  put  forward  by  Srebro  and  Jaakkola  in  [44],  One  can  interpret  the  first  term  (R ^(3)  0 
(RyX  —  D a.ij  +  d as  the  relevant  information  we  dispose  to  make  the  atoms  evolve, 
whereas  (ln  —  R(;/3)  0  d[al:](!)  can  be  viewed  as  a  balancing  term,  completing  the  equation  and 
permitting  to  perform  the  SVD. 

Finally,  note  that  the  averaging  expression,  Equation  (4),  remains  the  same  (modulo  the  I  and 
y  in  the  average,  which  are  weighted  by  (3),  since  all  has  been  taken  into  consideration  in  the 
previous  stages  of  the  algorithm. 

Let  us  now  present  the  model  that  combines  this  non-uniform  noise  handling  with  the  one 
developed  to  deal  with  color  artifacts  introduced  in  the  previous  section.  Mixing  these  two  new 
metrics  for  each  patch  ij,  we  use  then  the  following  new  inner  product  (metric)  during  the  OMP 
step  only: 

<  x,  y  >ij  =  ((R ijj3)  0  x)T(J  +  ^K)((R tj0)  0  y) 

Concerning  the  learning  step,  it  remains  identical  to  Equation  (9),  since  it  does  not  need  any 
color/bias  correction. 

To  conclude,  we  have  now  introduced  a  new  metric  that  addresses  possible  color  artifacts  as 
well  as  non-uniform  noise.  This  paves  the  way  for  applications  beyond  color  image  denoising, 
and  these  are  described  next. 

C.  Color  image  inpainting 

Image  inpainting  is  the  art  of  modifying  an  image  in  an  undetectable  form,  and  it  often  refers 
to  the  filling-in  of  holes  of  missing  information  in  the  image  [43].  Although  sparsity,  as  practiced 
here  with  localized  patches,  is  not  necessarily  an  efficient  model  for  filling  large  holes,  since  it 
would  intrinsically  lead  to  a  lack  of  details,  one  can  still  use  it  for  filling  small  holes,  as  long 
as  their  sizes  are  smaller  than  the  size  of  the  atoms.  For  larger  holes,  iterative  and/or  multiscale 
or  texture  synthesis  methods  like  in  [45]  are  needed. 

The  idea  for  extending  the  previously  described  work  for  inpainting  is  quite  simple.  If  one 
considers  holes  as  areas  with  infinite  power  noise,  this  leads  to  some  coefficients  equal  to 
0.  To  make  the  model  consistent  we  also  fixed  a  =  e  >  0  in  the  known  areas  to  prevent  infinite 
/3-coefficients.  Finally,  we  consider  two  possibilities: 
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•  If  we  have  both  noise  and  missing  information  (holes),  we  use  the  model  exactly  as  described 
above  for  color  image  restoration  with  non-uniform  noise. 

•  If  we  only  have  missing  information,  we  change  the  OMP  so  that  it  runs  for  not  more 
than  a  fixed  number  of  iterations  (this  replaces  the  error-based  stopping  criteria).  Then  we 
use  the  information  of  the  reconstructed  image  to  fill-in  the  holes.  This  approach  is  faster 
because  it  does  not  force  the  OMP  to  fit  exactly  the  known  areas  and  gives  similar  visual 
results. 

Note  also  that  within  the  limit  of  our  model,  some  /^-coefficients  equal  to  zero  can  mask  parts 
of  some  unnatural  atoms,  which  could  end-up  being  used.  Therefore,  we  initialize  the  algorithm 
with  a  global  dictionary  learned  on  a  large  dataset  of  clean  and  hole-free  images,  preventing 
the  use  of  non  natural  patterns  like  the  atoms  in  the  3D-DCT  dictionary,  which  proved  to  be 
inefficient  in  our  experiments  for  inpainting. 

Another  possible  problem  can  occur  when  the  matrix  of  the  coefficients  (3  follows  a  regular 
pattern.  This  can  lead  our  algorithm  to  learn  this  pattern,  absorb  it  into  the  dictionary,  and  thus 
overfit.  A  successful  strategy  for  preventing  this  is  presented  in  the  demosaicing  section  next, 
where  the  holes  do  form  a  repetitive  pattern  (this  is  much  more  unusual  in  ordinary  inpainting 
applications).  Note  also  that  with  inpainting,  we  do  not  have  the  problem  of  color  artifacts 
anymore  because  the  constraints  of  reconstruction  are  hard  (meaning  they  tend  to  have  a  perfect 
reconstruction  on  some  parts  of  the  image),  thereby  preventing  any  bias  problem.  That  is  why 
we  chose  7  =  0  in  this  case. 

To  conclude,  the  model  for  non-uniform  noise  can  be  readily  exploited  for  color  image 
inpainting,  and  examples  on  this  are  presented  in  the  experimental  section. 

D.  Color  image  demosaicing 

The  problem  of  color  demosaicing  consists  of  reconstructing  a  full  resolution  image  from 
the  raw  data  produced  by  a  common  colored-filtered  sensor.  Most  digital  cameras  use  CCD  or 
CMOS  sensors,  which  are  composed  of  a  grid  of  sensors.  One  sensor  is  associated  to  one  pixel 
and  is  able  to  measure  the  light  energy  it  receives  during  a  short  time.  Combined  with  a  color 
filter  ( R  (red),  G  (green),  or  B  (blue)),  it  retrieves  the  color  information  of  one  specific  channel. 
Therefore,  often  only  one  color  for  each  pixel  is  obtained  and  interpolation  of  the  the  missing 
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values  is  necessary.  The  most  used  pattern  for  this  is  the  Bayer  pattern,  GRGRGR. . .  on  odd 
lines  and  BGBGBG. . .  on  even  ones. 

Several  algorithms  have  been  developed  in  recent  years  to  produce  high  quality  full  color 
images  from  the  mosaic  sensor,  e.g.,  [46],  [47],  [48],  [49].  Although  color  demosaicing  is  be¬ 
coming  less  relevant  with  the  on-going  development  of  sensor  and  camera  technology,  addressing 
it  remains  a  challenging  task  that  helps  to  test  the  effectiveness  of  different  image  models  and 
image  processing  algorithms.  We  thereby  chose  to  address  this  problem  as  a  proof  of  the  relevance 
of  our  model,  helping  to  show  the  generality  of  our  framework.  The  fact  that  our  general  model 
performs  as  well  or  even  better  than  state-of-the-art  algorithms  exclusively  developed  for  image 
demosaicing,  shows  the  correctness  of  our  approach,  and  the  generality  of  the  sparsity  and 
redundancy  concepts,  along  with  the  appropriateness  of  the  K-SVD  algorithm  for  learning  the 
dictionary. 

We  opt  to  define  the  problem  of  demosaicing  as  an  inpainting  problem  with  very  small  holes 
which  consist  of  two  missing  channels  per  pixel.  Considering  that  we  do  not  need  any  smoothing 
effects  to  get  rid  of  some  noise  (assuming  for  the  sake  of  simplicity  that  the  available  colors 
are  noise  free),  neither  we  need  to  inpaint  large  holes,  we  chose  to  restrain  the  algorithm  to  use 
small  patches  well  adapted  for  retrieving  details.  We  chose  thus  5x5x3  patches. 

One  drawback  of  our  modified  K-SVD  algorithm  with  f3  coefficients  is  that  the  presence  of 
a  (hole)  pattern  in  the  matrix  (3  can  lead  the  algorithm  to  learn  it  (the  pattern  would  appear 
in  the  dictionary  atoms).  A  simple  way  of  addressing  this  problem  is  to  use  a  globally  learned 
dictionary  with  no-mosaic  images.  This  gives  already  quite  good  results,  and  as  we  shall  show 
next,  can  be  further  improved. 

We  introduce  the  idea  of  learning  the  dictionary  on  a  likely  image  with  some  artifacts  but 
with  a  low  number  of  steps  during  the  OMP  (low  number  of  atoms  will  be  used  to  represent 
the  patch),  in  order  to  prevent  any  learning  of  these  artifacts  (over-fitting).  We  define  then  the 
patch-sparsity  L  of  the  decomposition  as  this  number  of  steps.  The  stopping  criteria  in  Equation 
(2)  becomes  the  number  of  atoms  used  instead  of  the  reconstruction  error.  Using  a  small  L 
during  the  OMP  permits  to  learn  a  dictionary  specialized  in  providing  a  coarse  approximation. 
Our  assumption  is  that  (pattern)  artifacts  are  less  present  in  coarse  approximations,  preventing 
the  dictionary  from  learning  them.  We  propose  then  the  algorithm  described  in  Figure  5.  We 
typically  used  L  =  2  to  prevent  the  learning  of  artifacts  and  found  out  that  two  outer  iterations 
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in  the  scheme  in  Figure  5  are  sufficient  to  give  satisfactory  results,  while  within  the  K-SVD, 
10-20  iterations  are  required. 


Input:  ym  (mosaiced  image);  Ds  (one  pre-learned  global  dictionary). 

Demosaic  with  a  global  dictionary:  ym  using  D(/.  This  gives  x. 

Demosaicing  improvement:  Apply  Iter  times: 

•  Dictionary  Learning:  Learn  the  dictionary  on  x  using  a  low  L  patch- sparsity  factor 
starting  from  D9.  This  gives  an  adaptive  dictionary  Da. 

•  Demosaicing  using  joint  dictionary:  Demosaic  ym  (with  the  new  inner  product, 
derived  from  non-uniform  noise  considerations,  as  defined  in  the  previous  section) 
using  the  joint  dictionary  Da  U  Dr  This  gives  an  update  of  x. 

Return  x 

Fig.  5.  Modified  K-SVD  algorithm  for  color  image  demosaicing.  This  algorithm  combine  global  dictionaries  with  data  dependent 
ones  learned  with  a  low  patch- sparsity  factor. 


To  conclude,  in  order  to  address  the  demosaicing  problem,  we  use  the  modified  K-SVD 
algorithm  that  deals  with  non-uniform  noise,  as  described  in  previous  section,  and  add  to  it  an 
adaptive  dictionary  that  has  been  learned  with  low  patch-sparsity  in  order  to  avoid  over-fitting  the 
mosaic  pattern.  The  same  technique  can  be  applied  to  generic  color  inpainting  as  demonstrated 
in  the  next  section. 


V.  Experimental  results 

We  are  now  ready  to  present  the  color  image  denoising,  inpainting,  and  demosaicing  results 
that  are  obtained  with  the  proposed  framework. 

A.  Denoising  color  images 

The  state-of-the-art  performance  of  the  algorithm  on  gray-scale  images  has  already  been 
studied  in  [2].  We  now  evaluate  our  extension  for  color  images.  We  trained  some  dictionaries 
with  different  sizes  of  atoms  5x5x3,  6x6x3,  7x7x3  and  8  x  8  x  3,  on  200000  patches 
taken  from  a  database  of  15000  images  with  the  patch- sparsity  parameter  L  =  6  (6  atoms  in  the 
representations).  We  used  the  database  LabelMe,  [50],  to  build  our  image  database.  Then  we 
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trained  each  dictionary  with  600  iterations.  This  provided  us  a  set  of  generic  dictionaries  that  we 
used  as  initial  dictionaries  in  our  denoising  algorithm.  Comparing  the  results  obtained  with  the 
global  approach  and  the  adaptive  one  permits  to  see  the  improvements  in  the  learning  process. 
We  chose  to  evaluate  our  algorithm  on  some  images  from  the  Berkeley  Segmentation  database, 
[51],  presented  in  Figure  6.  This  data  selection  allows  us  to  compare  our  results  with  the  relevant 
work  on  color  image  denoising  reported  in  [25],  which  as  mentioned  before,  is  an  extension  of 
[26].  As  the  raw  performance  of  the  algorithm  can  vary  with  the  different  noise  realization,  the 
results  presented  in  Table  I  and  Table  II  are  averaged  over  5  experiments  for  each  image  and  each 
a.  Note  that  the  parameter  C  has  been  tuned  using  Equation  (5).  Some  visual  results  are  also 
presented  on  Figure  7  on  the  “castle”  image.  First  of  all,  one  can  observe  that  working  on  the 
whole  RGB  space  provides  an  important  improvement  when  compared  to  applying  gray-scale 
K-SVD,  [2],  on  each  channel  separately,  both  in  terms  of  PSNR  and  visually.  Concerning  the 
different  sizes  of  patches,  small  patches  are  better  at  retrieving  the  color  of  some  details  whereas 
large  patches  are  better  in  flat  areas.  For  example,  taking  the  example  of  Figure  7,  the  sky  on 
the  10  x  10  x  3  image  is  smoother  than  on  the  5x5x3,  whereas  the  color  of  the  red  curtain 
behind  the  windows  of  the  center  tower  is  slightly  washed  out  on  the  10  x  10  x  3  image.  This 
motivates  in  part  the  learning  of  multiscale  dictionaries  as  discussed  in  the  conclusion  of  this 
paper.  Another  visual  result  is  presented  Figure  8  on  the  “mushroom”  image.  Besides  this,  we 
present  an  example  of  denoising  a  non  uniform  noise  on  Figure  9,  where  a  white  Gaussian  noise 
has  been  added  to  each  data,  but  with  a  different  known  standard  deviation. 


Fig.  6.  Data  set  used  for  evaluating  denoising  experiments 


The  results  show  that  our  approach  is  well  adapted  to  color  images.  The  quality  of  the  results 
obtained  by  applying  the  extended  color  K-SYD  algorithm  to  the  RGB  space  are  significantly 
better  than  when  denoising  each  RGB  channel  separately.  Moreover,  the  algorithm  outperforms 
the  most  recent  work  on  learning  color  images  reported  in  [25]. 

The  K-SYD  algorithm  is  relatively  fast  for  denoising.  The  complexity  depends  on  the  number 
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Fig.  7.  Results  obtained  on  the  castle  image.  Bottom  left  image  is  the  noisy  image  with  a  —  25,  top  left  image  is  the  original 
one.  Then,  from  left  to  right  are  respectively  presented:  the  results  obtained  with  patches  5x5x3,  7x7x3,  10x10x3, 
and  the  result  obtained  applying  directly  the  grayscale  K-SVD  algorithm,  [2],  on  each  channel  separately  using  patches  8x8. 
The  bottom  row  shows  a  zoomed-in  region  for  each  one  of  the  top  row  images. 


Fig.  8.  Result  obtained  by  applying  our  algorithm  with  7x7x3  patches  on  the  mushroom  image  where  a  white  Gaussian 
noise  of  standard  deviation  a  —  25  has  been  added. 


(a)  Original 


(b)  Noisy 


(c)  Denoised  Image 
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cr/ PSNR 

castle 

mushroom 

train 

horses 

kangaroo 

Average 

5/34.15 

39.90 

37.96 

38.81 

37.51 

38.90 

35.92 

39.25 

36.96 

38.30 

37.51 

39.03 

37.17 

39.57 

40.37 

39.61 

39.93 

37.43 

39.76 

39.52 

40.09 

38.54 

39.00 

38.93 

39.83 

10/28.13 

35.50 

33.93 

34.16 

33.25 

33.63 

31.04 

34.45 

32.50 

33.14 

30.97 

34.18 

32.24 

35.77 

36.24 

35.49 

35.60 

33.68 

34.72 

35.14 

35.43 

33.87 

34.06 

34.79 

35.21 

15/24.61 

33.13 

31.70 

31.76 

30.97 

30.50 

28.31 

31.90 

30.24 

30.40 

28.78 

31.54 

30.00 

33.4 

33.98 

33.10 

33.18 

30.65 

31.70 

32.53 

32.76 

31.15 

31.30 

32.17 

32.58 

25/20.17 

29.98 

28.99 

28.73 

28.37 

26.72 

25.32 

28.84 

27.87 

27.48 

26.69 

28.35 

27.45 

30.77 

31.19 

30.21 

30.26 

27.69 

28.16 

29.69 

29.81 

28.31 

28.39 

29.33 

29.56 

TABLE  I 

PSNR  results  of  our  denoising  algorithm  with  256  atoms  of  size  7  x  7  x  3  for  a  >  10  and  6  x  6  x  3  for  a  <  10.  Each  case  is 
divided  in  four  parts:  The  top-left  results  are  those  given  hy  McAuley  and  al  [25]  with  their  “3  x  3  model  ”  The  top-right 
results  are  those  obtained  by  applying  the  grayscale  K-SVD  algorithm,  [2],  on  each  channel  separately  with  8x8  atoms.  The 
bottom-left  are  our  results  obtained  with  a  globally  trained  dictionary.  The  bottom-right  are  the  improvements  obtained  with 
the  adaptive  approach  with  20  iterations.  Bold  indicates  the  best  results  for  each  group.  As  can  be  seen,  our  proposed 

technique  consistently  produces  the  best  results. 


of  patches,  the  sparsity  of  the  decomposition  (which  depends  on  a  as  well),  and  the  size  of  the 
patches.  With  our  experimental  implementation  in  C++,  it  took  approximatively  4.4s  to  remove 
noise  with  standard  deviation  a  =  25  from  a  256  x  256  x  3  color  image  with  patches  of  size 
5x5x3  with  a  global  dictionary  (one  iteration),  and  about  1  min.  40  sec.  when  we  performed 
20  iterations  of  the  algorithm  on  a  Pentium-M  1.73GHz  processor. 

B.  Inpainting  color  images 

We  now  present  results  for  inpainting  small  holes  in  images.  The  first  example  shows  the 
behavior  of  our  algorithm  when  removing  data  from  the  castle  image.  It  is  presented  on  Figure 
10.  The  second  example,  Figure  11,  is  a  classical  example  of  text  removal,  [43],  and  was  used  in 
[26]  in  order  to  evaluate  their  model  compared  to  the  pioneer  work  from  [43].  In  [26],  the  Field 
of  Experts  model  achieves  32.23  dB  using  their  algorithm  on  the  YCbCr  space  and  32.39  dB  on 
the  RGB  space.  With  our  model,  using  9x9x3  patches  which  are  large  enough  to  fill  in  the 
holes  and  a  dictionary  with  512  atoms  to  ensure  the  over-completeness,  a  patch-sparsity  of  20, 
and  20  learning  iterations,  we  obtained  32.45  dB.  The  results  from  these  two  models  are  very 
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cr/PSNR 

[25] 

3x3 

[25] 

5x5 

Global 

5x5x3 

Adaptive 

5x5x3 

Global 

6x6x3 

Adaptive 

6x6x3 

Global 

7x7x3 

Adaptive 

7x7x3 

5/34.15 

39.90 

40.26 

39.94 

40.16 

39.57 

40.37 

38.20 

40.13 

10/28.13 

35.50 

35.91 

35.71 

36.05 

35.77 

36.24 

35.33 

36.25 

15/24.61 

33.13 

33.49 

33.27 

33.70 

33.53 

33.93 

33.40 

33.98 

25/20.17 

30.00 

30.41 

30.22 

30.80 

30.66 

31.05 

30.77 

31.19 

TABLE  II 

Comparison  of  the  PSNR  results  on  the  image  “ castle  ”  between  [25]  and  what  we  obtained  with  256  6  x  6  x  3  awJ  7x7x3 
patches.  For  the  adaptive  approach,  20  iterations  have  been  performed.  Bold  indicates  the  best  result,  indicating  once  again 

the  consistent  improvement  obtained  with  our  proposed  technique. 


(a)  Original  (b)  Noisy  (c)  Denoised  Image 

Fig.  9.  Example  of  non- spatially -uniform  white  Gaussian  noise.  For  each  image  pixel,  a  takes  a  random  value  from  a  uniform 
distribution  in  [1;  101].  The  a  values  are  assumed  to  be  known  by  the  algorithm.  Here,  the  denoising  has  been  performed  with 
7x7x3  patches  and  20  learning  iterations.  The  initial  PSNR  was  12.77  dB,  the  resulting  PSNR  is  29.87  dB. 


similar  and  both  achieve  better  results  than  those  presented  in  the  original  inpainting  algorithm 
[43], 

C.  Demosaicing 

We  now  present  results  for  demosaicing  images.  We  ran  our  experiments  on  the  images  in 
Figure  12,  which  are  taken  from  the  Kodak  image  database.  The  size  of  the  images  is  512  x  768, 
encoded  in  RGB  with  8  bits  per  channel.  We  simulated  the  mosaic  effects  using  the  Bayer  pattern, 
present  in  Table  III  the  results  in  terms  of  PSNR  using  a  globally  trained  dictionary  with  atoms 
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(a)  Original  Image  (b)  Damaged  Image  (c)  Restored  Image 

Fig.  10.  From  left  to  right  :  The  original  image,  the  image  with  80%  of  data  removed,  the  result  of  our  inpainting  using 
7x7x3  patches.  The  restored  image  PSNR  is  29.36  dB. 


(a)  Original  image 

Fig.  11.  Inpainting  for  text  removal. 


(b)  Image  with  text 


(c)  Restored  image 


of  size  5x5x3,  with  a  patch- sparsity  factor  L  =  20,  and  compare  to  bilinear  interpolation,  the 
results  given  by  Kimmel’s  algorithm  [48],  three  very  recent  methods  and  state-of-the-art  results 
presented  in  [46], 

For  some  very  difficult  regions  of  some  images,  our  generic  color  image  restoration  method, 
when  applied  to  demosaicing,  does  not  give  better  results  than  the  best  interpolation-based 
methods,  which  are  tuned  to  track  the  classical  artifacts  from  the  demosaicing  problem.  On  the 
other  hand,  on  average,  our  algorithm  performs  as  well  and  sometimes  better  than  state-of-the- 
art,  improving  by  0. 1 1  dB  the  average  result  on  this  standard  dataset  when  compared  to  the  best 
demosaicing  algorithm  so  far  reported.  The  fact  that  we  did  not  introduce  any  special  procedure 
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Im 

BI 

K  [48] 

AP  [47] 

OR  [49] 

SA  [52] 

CC  [46] 

D1 

D1L 

D2 

D2L 

1 

26.21 

24.64 

37.70 

34.66 

38.32 

38.53 

37.17 

38.19 

37.77 

38.97 

2 

33.08 

32.29 

39.57 

39.22 

39.95 

40.43 

38.06 

39.68 

39.59 

40.81 

3 

34.45 

33.84 

41.45 

41.18 

41.18 

42.54 

40.89 

42.94 

41.51 

43.19 

4 

33.46 

33.45 

40.03 

38.56 

39.55 

40.50 

39.93 

41.46 

40.03 

40.76 

5 

26.51 

31.89 

37.46 

35.25 

36.48 

37.89 

36.99 

38.05 

37.77 

38.24 

6 

27.66 

35.58 

38.50 

31.04 

39.08 

40.03 

37.69 

39.14 

38.32 

39.61 

7 

33.38 

36.36 

41.77 

41.39 

41.50 

42.15 

40.87 

42.53 

41.70 

42.79 

8 

23.39 

33.23 

35.08 

31.04 

35.87 

36.41 

34.89 

35.58 

35.30 

35.94 

9 

31.16 

39.36 

41.72 

41.27 

41.94 

43.04 

41.82 

42.50 

42.39 

42.96 

10 

32.38 

39.22 

42.02 

40.39 

41.80 

42.51 

41.65 

42.11 

41.95 

42.50 

11 

28.96 

35.65 

39.14 

37.42 

38.92 

39.86 

38.58 

39.26 

38.97 

39.70 

12 

33.27 

39.37 

42.51 

42.30 

42.37 

43.45 

41.26 

42.93 

42.31 

43.42 

13 

23.79 

31.07 

34.30 

31.08 

34.91 

34.90 

33.62 

34.13 

34.57 

34.86 

14 

29.05 

32.50 

35.60 

35.58 

34.52 

36.88 

36.31 

37.37 

37.12 

37.87 

15 

33.04 

34.19 

39.53 

37.77 

38.97 

39.78 

38.42 

40.16 

38.29 

39.77 

16 

31.13 

38.12 

41.76 

41.82 

41.60 

43.64 

41.03 

42.33 

42.15 

43.15 

17 

31.80 

38.34 

41.11 

39.06 

40.97 

41.21 

40.59 

40.98 

41.25 

41.72 

18 

28.30 

32.75 

37.45 

35.28 

37.27 

37.49 

36.13 

36.67 

36.73 

37.21 

19 

27.84 

36.50 

39.46 

38.06 

39.96 

41.00 

38.36 

39.98 

38.37 

40.45 

20 

31.51 

37.63 

40.66 

39.05 

40.51 

41.07 

39.29 

39.98 

40.11 

40.86 

21 

28.38 

36.07 

38.66 

36.22 

38.93 

39.12 

38.23 

38.58 

38.82 

39.27 

22 

30.14 

36.00 

37.55 

36.49 

37.67 

37.97 

37.61 

38.21 

38.02 

38.56 

23 

34.83 

34.01 

41.88 

41.34 

41.79 

42.89 

41.67 

42.72 

41.76 

42.96 

24 

26.83 

32.95 

34.78 

32.49 

34.82 

35.04 

34.27 

35.14 

34.54 

35.44 

Avg. 

30.06 

35.21 

39.15 

37.70 

39.12 

39.93 

38.55 

39.61 

39.14 

40.04 

TABLE  III 


Comparison  of  the  PSNR,  in  dB,  for  different  demosaicing  algorithms  on  the  Kodak  data  set.  Some  results  are  taken  from  the 
paper  [46].  BI  refers  to  a  simple  bilinear  interpolation,  then,  K,  AP,  OR,  SA,  CC  refer  to  the  algorithms  respectively  from 
[48],  [47],  [49],  [52],  [46].  D1  refers  to  the  results  obtained  with  a  globally  trained  dictionary  (200  iterations  with  L  =  6  on 
200000  different  5x5x3  patches),  D2  refers  to  a  better  trained  dictionary  (600  iterations  with  L  —  6  on  200000  different 
5x5x3  patches ).  Then,  D1L  and  D2L  refer  to  the  result  obtained  with  these  dictionaries  and  the  learning  process  we 
already  described,  with  two  times  20  learning  iterations  with  a  patch- sparsity  equal  to  15.  Bold  indicates  the  best  results  for 
each  image,  and  these  are  mostly  shared  between  our  algorithm  and  the  one  recently  reported  in  [46],  which  was  explicitly 

designed  for  handling  demosaicing  problems. 
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Fig.  12.  Kodak  image  database,  images  1  to  24  from  left  to  right  and  top  to  bottom,  so  that  the  upper  left  image  is  nl,  and 
the  upper  right  image  is  n6 


to  address  the  classical  demosaicing  artifacts  and  still  achieve  state-of-the-art  results,  clearly 
indicates  the  power  of  our  framework.  Some  visual  results  are  presented  in  Figure  13. 

VI.  Discussion  and  concluding  remarks 

In  this  paper  we  introduced  a  framework  for  color  image  restoration,  and  presented  results  for 
color  image  denoising,  inpainting,  and  demosaicing.  The  framework  is  based  on  learning  models 
for  sparse  color  image  representation.  The  gray-scale  K-SVD  algorithm  introduced  in  [3],  [2] 
proved  to  be  robust  toward  the  dimensionality  increase  resulting  from  the  use  of  color.  Following 
the  extensions  here  introduced,  this  algorithm  learns  some  correlation  between  the  different  R,G,B 
channels  and  provides  noticeably  better  results  than  when  modeling  each  channel  separately. 

Although  we  already  obtain  state-of-the-art  results  with  the  framework  presented  here,  there  is 
still  room  for  future  improvement.  The  artifacts  found  in  images  modeled  with  small  patches  are 
often  different  from  those  we  observed  on  images  modeled  with  larger  patches.  Large  patches 
introduce  a  smoothing  effect,  giving  very  good  results  on  flat  areas,  whereas  some  details  could 
be  blurred.  On  the  other  hand,  small  patches  are  very  good  at  retrieving  details,  whereas  they 
might  introduce  artifacts  in  flat  areas.  This  motivates  us  to  consider  a  multiscale  version  of  the 
K-SVD,  and  a  multiscale  learning  for  image  modeling  in  general. 

We  present  here  a  preliminary  result  in  this  multiscale  direction,  working  with  just  two  different 
patch  sizes.  The  idea  consists  of  performing  a  weighted  average  between  the  result  obtained  with 
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(e)  Image  7  restored 


(f)  Zoomed  region 


(g)  Image  14  restored 


(h)  Zoomed  region 


Fig.  13.  Examples  of  demos aicing  results.  The  image  13(a)  is  one  where  we  do  not  perform  (visually)  as  well  as  the  best 
interpolation-based  algorithm.  On  the  other  ones,  Figures  13(e),  13(g)  and  13(c),  we  outperform  these  algorithms. 


these  sizes.  The  weights  are  derived  from  the  sparsity  of  each  pixel  in  the  reconstructed  images. 
We  define  the  pixel-sparsity  of  one  pixel  in  the  reconstructed  image  as  the  average  sparsity  of 
each  patch  which  contains  this  pixel. 

We  consider  now  gray-scale  images  to  simplify  the  notations.  Assume  we  denoise  the  image 
y  =  x  +  w,  following  our  proposed  framework,  with  one  patch  size  s,  obtaining  xs,  and  then 
with  another  patch  size  b,  obtaining  x6.  We  want  to  find  the  optimal  set  of  weights  A.y-  for  each 
pixel  ij,  without  knowing  the  original  image  x,  such  that 

\i:j  =  argomin  |Xij  -  (Ax£  +  (1  -  A)x^)|. 

In  order  to  learn  the  relationship  between  the  pixel-sparsity  and  the  optimal  weights,  we  use  a 
Support  Vector  Regression  algorithm  with  a  linear  kernel  [53].  The  basic  algorithm  is  presented 
on  Figure  14.  We  observed  some  improvements  with  this  two-scale  dictionary,  as  presented  in 
Table  IV,  encouraging  our  ongoing  research  on  learning  multiscale  image  models  and  multiscale 
sparsity  frameworks. 
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Input:  y  (noisy  image);  a  (deviation  of  the  noise);  F  (generic  data-set  of  different 
images). 

Initialization  of  the  database: 

•  Creation  of  the  noisy  database:  Create  T,  the  noisy  version  of  T,  where  one  adds 
a  noise  of  deviation  o  to  each  image  of  T  in  order  to  create  the  learning  database. 

•  Denoising  the  learning  database  with  small  patches:  Denoise  F  using  patches  of 

size  s  x  s.  This  gives  the  reconstructed  images  T*  and  the  sparsity  of  each  pixel, 

T  s 
lt. 

•  Denoising  the  learning  database  with  large  patches:  Denoise  T  using  patches  of 
size  b  x  b.  This  gives  the  reconstructed  images  Tb  and  L[. 

•  Computation  of  the  optimal  weights:  For  each  pixel  (i,j)  of  each  image  m  of  T, 
denoted  by  Fy-m,  compute  the  optimal  weight: 

\jm  —  arg  in f  I  t'ijm  ~  "F  (1  —  F)r,J?rJ  |  ■ 

Learning  the  rule  between  sparsity  and  optimal  weights:  Train  a  Support  Vector 
Regression  using  Lf  and  Lb  as  inputs  and  the  \iym  as  outputs,  or  eventually  one  part 
of  this  training  set. 

Denoising  process: 

•  With  small  patches:  Denoise  y  using  patches  of  size  s  x  s.  This  gives  the 
reconstructed  images  xs  and  the  sparsity  of  each  pixel,  Ls . 

•  With  large  patches:  Denoise  y  using  patches  of  size  b  x  b.  This  gives  the 
reconstructed  images  x6  and  Lb. 

•  Estimation  of  the  optimal  weights:  Use  the  Support  Vector  Regression  to  estimate 
the  optimal  weights  for  x  using  Ls  and  Lb  as  inputs.  This  gives  a  matrix  A. 

•  Weighted  averaging:  The  output  of  the  algorithm  is  then  the  weighted  average 
between  xs  and  x6,  with  A  providing  the  weights. 


Fig.  14.  K-SVD  algorithm  for  denoising  with  two  sizes  of  patches. 
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Training  Image 

Noisy  image 

Small  size/PSNR 

Big  size/PSNR 

Result  with  SYR 

Train 

Castle 

5x5x3/  30.74 

8x8x3/  31.11 

31.22 

Castle 

Train 

5x5x3/  28.04 

8x8x3/  28.06 

28.24 

Mushroom 

Castle 

5x5x3/  30.65 

8x8x3/  31.12 

31.20 

Castle 

Mushroom 

5x5x3/  30.10 

8x8x3/  30.30 

30.37 

Lena 

Boat 

5x5x1/  28.42 

10  x  10  x  1  /  29.37 

29.43 

Boat 

Lena 

5x5x1/  29.28 

10  x  10  x  1  /  31.40 

31.41 

TABLE  IV 


Preliminary  results  when  working  with  two  sizes  of  patches.  The  SVR  is  trained  on  the  image  of  the  first  column,  then  it  is 
used  to  denoise  the  image  on  the  second  column,  which  has  been  noised  with  a  white  Gaussian  noise  of  standard  deviation 
a  —  25.  Third  and  fourth  columns  indicate,  respectively,  the  PSNR  result  when  denoising  with  5x5x3  and  8x8x3  atoms. 
The  last  column  presents  the  PSNR  result  for  the  joint  two-patches  reconstruction. 


Concerning  the  computational  complexity,  the  trade-off  between  speed  and  quality  can  be 
chosen  by  the  user.  Using  global  dictionaries  provides  fast  results.  Inpainting/demosaicing  can 
be  performed  with  a  few  step  in  the  OMP  (parameter  L ).  One  great  advantage  of  the  K-SVD 
algorithm  which  will  be  more  and  more  important  in  the  future  is  that  it  can  easily  be  parallelized. 
Nowadays  processors  are  not  progressing  a  lot  in  the  number  of  sequential  operations  per  second 
but  are  multiplying  the  number  of  cores  inside  each  chip.  Therefore,  it  has  become  very  important 
to  design  algorithms  which  can  be  parallelized.  In  our  case,  the  OMP  can  be  performed  with  one 
patch  per  processor  with  maximal  efficiency.  For  the  learning  step,  some  libraries  like  ARPACK 
provide  incomplete  Singular  Value  Decomposition  for  parallel  machines  and  it  is  possible  to 
perform  the  learning  of  two  atoms  in  parallel  if  the  sets  of  patches  which  use  them  are  not 
overlapping.  One  strategy  to  further  improve  complexity  consists  of  reducing  the  overlapping  of 
the  patches  and  using  only  one  portion  of  the  patches  when  working  with  large  images.  With 
patches  of  size  8x8  and  gray-scale  images  512  x  512,  a  one  fourth  reduction  proved  not  to 
affect  the  quality  of  the  reconstruction  [2],  In  our  examples,  we  chose  to  always  use  a  complete 
set  of  overlapping  patches,  but  if  one  wants  to  use  high  definition  images  with  more  than  one 
million  of  pixels,  this  would  be  problematic.  This  is  an  additional  motivation  for  our  current 
efforts  on  multiscale  frameworks. 
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We  chose  to  work  with  natural  color  images  because  of  their  practical  relevance  and  because 
it  is  a  very  convenient  data  source  in  order  to  compare  our  algorithm  with  other  results  available 
in  the  literature.  One  main  characteristic  we  have  not  discussed  in  this  paper  is  the  capability  of 
this  algorithm  to  be  adapted  to  other  types  of  vectorial  data.  Future  work  will  consist  of  testing 
the  algorithm  on  multi-channels  images  such  as  LANDSAT.  Another  interesting  future  direction 
consists  of  learning  dictionaries  for  specific  classes  of  images  such  as  MRI  and  astronomical 
data.  We  strongly  believe  that  this  framework  could  provide  cutting  edge  results  in  modeling 
this  kind  of  data  as  well. 
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